Agreement Constraints for Statistical Machine Translation into German
Abstract
Languages with rich inflectional morphology pose a difficult challenge for statistical machine translation. To address the problem of morphologically inconsistent output, we add unification-based constraints to the target side of a string-to-tree model. By integrating constraint evaluation into the decoding process, implausible hypotheses can be penalised or filtered out during search. We use a simple heuristic process to extract agreement constraints for German and test our approach on an English-German system trained on WMT data, achieving a small improvement in translation accuracy as measured by BLEU.
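The idea of unification-based agreement checking can be illustrated with a minimal sketch. This is not the authors' implementation: the flat feature-structure representation, the helper names (make_fs, unify, agrees, score_hypothesis), the toy lexicon entries, and the penalty value are all assumptions made for illustration. Each word carries the set of values it still allows for case, number, and gender; two words agree if every shared feature has at least one value in common.

```python
from typing import Dict, FrozenSet, Optional, Sequence

# A feature structure maps a feature name to the set of values still possible.
# (Assumption: a flat structure; real grammars may use nested or disjunctive ones.)
FeatureStructure = Dict[str, FrozenSet[str]]


def make_fs(**features: str) -> FeatureStructure:
    """Build a feature structure; '|' separates alternative values."""
    return {name: frozenset(value.split("|")) for name, value in features.items()}


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of a and b, or None if any shared feature clashes."""
    result = dict(a)
    for feature, values in b.items():
        if feature in result:
            common = result[feature] & values
            if not common:
                return None  # no compatible value left: unification fails
            result[feature] = common
        else:
            result[feature] = values
    return result


def agrees(structures: Sequence[FeatureStructure]) -> bool:
    """True if all structures can be unified into one consistent structure."""
    current: Optional[FeatureStructure] = {}
    for structure in structures:
        current = unify(current, structure)
        if current is None:
            return False
    return True


def score_hypothesis(
    base_score: float,
    structures: Sequence[FeatureStructure],
    penalty: Optional[float] = None,
) -> Optional[float]:
    """Hypothetical decoder hook: filter (None) or penalise a failing hypothesis."""
    if agrees(structures):
        return base_score
    if penalty is None:
        return None  # hard constraint: drop the hypothesis
    return base_score - penalty  # soft constraint: keep it at a lower score


# Toy lexicon entries (simplified, illustrative only).
DEM = make_fs(case="dat", number="sg", gender="masc|neut")    # article "dem"
HUND = make_fs(case="nom|dat|acc", number="sg", gender="masc")  # noun "Hund"
KATZE = make_fs(case="nom|gen|dat|acc", number="sg", gender="fem")  # noun "Katze"

print(unify(DEM, HUND))                                # consistent: dative singular masculine
print(agrees([DEM, KATZE]))                            # gender clash
print(score_hypothesis(-2.0, [DEM, KATZE], penalty=1.5))
```

With penalty left unset, the failing hypothesis is filtered out, which corresponds to the hard-constraint case; passing a penalty keeps it in the search at a lower score, which corresponds to the soft-constraint case.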
Similar resources
Modeling verbal inflection for English to German SMT
German verbal inflection is frequently wrong in standard statistical machine translation approaches. German verbs agree with subjects in person and number, and they bear information about mood and tense. For subject–verb agreement, we parse German MT output to identify subject–verb pairs and ensure that the verb agrees with the subject. We show that this approach improves subject-verb agreement...
German Compounds in Factored Statistical Machine Translation
An empirical method for splitting German compounds is explored by varying it in a number of ways to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation in a preprocessing step, performed on training data and on German translation input. For translation into German, compounds ...
Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation
This paper describes a new method to compare reordering constraints for Statistical Machine Translation. We investigate the best possible (oracle) BLEU score achievable under different reordering constraints. Using dynamic programming, we efficiently find a reordering that approximates the highest attainable BLEU score given a reference and a set of reordering constraints. We present an empiric...
Preference Grammars and Soft Syntactic Constraints for GHKM Syntax-based Statistical Machine Translation
In this work, we investigate the effectiveness of two techniques for a feature-based integration of syntactic information into GHKM string-to-tree statistical machine translation (Galley et al., 2004): (1.) Preference grammars on the target language side promote syntactic well-formedness during decoding while also allowing for derivations that are not linguistically motivated (as in hierarchical ...
Statistical Machine Translation of German Compound Words
German compound words pose special problems to statistical machine translation systems: the occurrence of each of the components in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate on...
Publication date: 2011